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Compounds of low lattice thermal conductivity (LTC) are essential for seeking thermoelectric 
materials with high conversion efficiency. Some strategies have been used to decrease LTC. However, 
such trials have yielded successes only within a limited exploration space. Here we report the virtual 
screening of a library containing 54,779 compounds. Our strategy is to search the library through 
Bayesian optimization using for the initial data the LTC obtained from first-principles anharmonic 
lattice dynamics calculations for a set of 101 compounds. We discovered 221 materials with very 
low LTC. Two of them have even an electronic band gap < 1 eV, what makes them exceptional 
candidates for thermoelectric applications. In addition to those newly discovered thermoelectric 
materials, the present strategy is believed to be powerful for many other applications in which 
chemistry of materials are required to be optimized. 


Thermoelectric generators are essential for utilizing 
otherwise waste heat. Because of the technological im¬ 
portance, researchers have been seeking materials with 
high conversion efficiency for decades p}[4]. Compounds 
of low lattice thermal conductivity (LTC) are essential 
for this purpose. Different strategies have been used 
to decrease LTC. Recently, high throughput screening 
(HTS) of materials using materials database constructed 
by first principles calculations has been recognized as 
an efficient tool for accelerated materials discovery [SH^. 
Thanks to the recent progress of computational power 
and techniques, a large set of first principles calculations 
can be performed with the accuracy comparable to ex¬ 
periments. This is a straightforward strategy when both 
of the following conditions are satisfied: 1) the target 
physical property can be accurately computed by first 
principles methods. 2) The exploration space is well de¬ 
fined and not too large to compute the target physical 
property exhaustively in the space. 

In order to evaluate LTC with the accuracy compara¬ 
ble to experimental data, however, we need to develop 
a method that is far beyond the ordinary density func¬ 
tional theory (DFT) calculations. Since we need to treat 
multiple interactions among phonons, or anharmonic lat¬ 
tice dynamics, the computational cost is many orders of 
magnitudes higher than the ordinary DFT calculations. 
Such expensive calculations are practically possible only 
for a small number of simple compounds. HTS of a large 
DFT database of LTC is not a realistic approach un¬ 
less the exploration space is narrowly confined. In the 
year 2014, Carrete and coworkers concentrated their ef¬ 
forts to search low LTC materials within half-Heusler 
compounds 10 . They made HTS of wide variety of half- 


Heusler compounds by examination of thermodynamical 
stability via DFT results. Then LTC was estimated ei¬ 
ther by full first principles calculations or by a machine¬ 
learning algorithm for a selected small number of com¬ 
pounds. HTS of low LTC using a quasiharmonic Debye 
model was also reported in 2014[lT]. Efficient prediction 
of LTC through compressive sensing of lattice dynam¬ 
ics was recently demonstrated [L2] . Development of such 
new methods would bring accelerated discovery of new 
materials in the future. 

In the present study, we do not want to restrict the 
exploration space by empirical knowledge, for example, 
by crystal structure. We firstly evaluated LTC of 101 
compounds with three prototype structures, i.e., rock- 
salt, zincblende and wurtzite-type structures, by first- 
principles anharmonic lattice dynamics calculations and 
solving Boltzmann transport equation with the single¬ 
mode relaxation-time approximation[T3l [14] . Then the 
results are used to construct a model for making “virtual 
screening” of many compounds in a library with a diver¬ 
sity of structures and chemical compositions employing 
Bayesian optimization procedure. For the Bayesian opti¬ 
mization, predictors are determined by kriging method to 
find the lowest LTC compound among the 101 first prin¬ 
ciples data. The highly ranked compounds are supplied 
to first principles LTC calculations to verify the result of 
the screening. 

Computational procedure of LTC is described in de¬ 
tail elsewhere p~3j. LTCs were calculated from phonon 
lifetimes, group velocities, and mode-heat capacities solv¬ 
ing the phonon Boltzmann transport equation within the 
relaxation time approximation. The phonon properties 
were calculated from the force constants. We employed 
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first-principles calculation to obtain second-order force 
constants (FC2) and third-order force constants (FC3) 
with the super cell and finite displacement approaches. 
Phonopy code was used for these phonon calculations [14]. 
Finite displacements of 0.03 A were systematically in¬ 
troduced to perfect supercells to fill up all elements of 
force constant tensor elements among atoms in the su¬ 
percells. The Brillouin zone integration for the phonon 
lifetime calculation was performed by the linear tetrahe¬ 
dron method. 

For the first principles calculations, we employed 
the plane-wave basis projector augmented wave (PAW) 
method p~5] in the framework of DFT and the generalized 
gradient approximation of the Perdew-Burke-Ernzerhof 
(PBE) form [16] as implemented in the VASP codepTTT- 
[19]. Much more attention for the convergence of DFT 
calculations should be paid in the phonon calculations 
as compared to the ordinary first principles calculations 
with respect to the k-point mesh, plane wave energy cut¬ 
off and tolerances of energy, residual force and stress. 
The size of the supercell was chosen by observing the 
convergence of phonon properties by changing the su¬ 
percell size. Low LTC crystals are generally more an- 
harmonic and the atomic interaction range is considered 
relatively large. LTC calculations of the highly ranked 
compounds required larger supercells than those for or¬ 
dinary crystals with smaller anharmonicity. The plane 
wave energy cutoff was chosen to be at least 20% higher 
than the recommended values in the PAW dataset. To¬ 
tal energies were minimized until the energy convergences 
became less than 10 -8 eV. 

Results of first principles LTC of 101 compounds are 
shown with crystalline volume per atom, V, and den¬ 
sity, p, in Figs, [I] (a) and (b). Among 101 com¬ 
pounds, PbSe with the rocksalt structure show the lowest 
LTC, 0.9 W/mK (@300 K). It is in the similar trend as 
the recent report showing low LTC for lead- and tin- 
chalcogenides [2QH23] . The computed results are com¬ 
pared with available experimental data in Fig. |T] (c). 
Satisfactory agreements between experimental and com¬ 
puted results are evident in Fig. [l] (c), demonstrating 
the usefulness of the first principles LTC data for further 
studies. A phenomenological relationship has been pro¬ 
posed that log kl is proportional to log R [24]. Although 
qualitative correlation can be seen between our LTC and 
V, it is difficult to predict LTC quantitatively, hence to 
discover new compounds with low LTC, only from the 
phenomenological relationship. It can be noted that the 
dependence on V is remarkably different between rocksalt 
type and zincblende or wurtzite type compounds, while 
zincblende and wurtzite type compounds show similar 
LTC when the chemical compositions are the same. 

The 101 first principles LTC data are then used to 
make a model for the prediction of LTC of compounds 
within a library on the basis of the Bayesian optimization. 
For the purpose of the prediction, it is preferable to select 


“good” predictors. Our rule of thumb is as follows: 1) 
whenever experts’ knowledge is available as a physical or 
phenomenological rule, it should be examined as the first 
step. 2) Predictors may be better included in a library or 
those easily made by combining the physical quantities 
in a library. Alternatively, the predictors may be easily 
computed by DFT calculations. 3) High efficiency for the 
Bayesian optimization procedure needs to be examined. 

On the basis of these ideas, we firstly determine pre¬ 
dictors for the Bayesian optimization procedure to find 
the lowest LTC compound among the 101 first princi¬ 
ples LTC data. We adopt kriging method based on the 
Gaussian process regression (GPR) [25j [26] of LTC sim¬ 
ply using two physical quantities, V and p, as predic¬ 
tors. These quantities are available in most of the experi¬ 
mental or computational crystal structure database, such 
as ICSD[27], Atomwork[28], Materials Project Database 
(MPD)[29], and aflowlib[30]. Although a phenomenolog¬ 
ical relationship has been proposed between log/^z, and 
V ‘2A\. the correlation between them is not so high. The 
correlation between log k>l and p is even worse. 

We start from an observed data set of 5 compounds 
that is randomly chosen from 101 compounds. In the 
kriging, a compound with maximum probability of im¬ 
provement among the remaining data is searched, namely 
a compound with the highest Z-score derived from GPR. 
The compound is included into the observed data set 
and then another compound with maximum probability 
of improvement is searched. Both the kriging and ran¬ 
dom searches are repeated fifty times and the average 
number of observed compounds required for finding the 
compound with the lowest LTC is examined. 

When — log kl is expressed as /, Z-score for a com¬ 
pound with predictors cc* is defined as 

Z{X*) = [f(x*) - /best] /V v ( x *) (!) 

where f(x*) and v(x*) denote the predicted value of 
— log kl and its prediction variance at a point expressed 
by predictors x*, respectively. v(x*) is expected to 
be small for compounds near the observed data, while 
it can be large for compounds far from the observed 
data, /best denotes the lowest LTC value among “ob¬ 
served” compounds, which is updated at each kriging 
step. Z-score that is evaluated by dividing [/(sc*) — /best] 
by the square root of the prediction variance, y/v(x*) 
tends to select candidates with maximum probability of 
improvement [31] . Here the prediction and its variance 
are described using the Gaussian kernel function. There¬ 
fore, our GPR has two free parameters, i.e. variances of 
Gaussian kernel and prior distribution. Here, they are 
given as 20 and 0.1, respectively. 

Figure [2] (a) shows the result of the kriging search in 
comparison to the random search of the lowest LTC com¬ 
pounds within the 101 compounds. The average num¬ 
bers of compounds required for the optimization using 
the kriging and random searches, 7V ave , are 11 and 55, 
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FIG. 1. LTC calculated from first principles for 101 compounds along with (a) volume, V, and (b) density, p. (c) Experimental 
LTC data are shown for comparison when experimental LTCs are available. 



Number of observations Number of observations 

FIG. 2. Lowest LTC values at each iteration in kriging search 
for finding (a) PbSe and (b) Lil. Those by random searches are 
also shown for comparison. When performing a kriging search 
for finding Lil, PbSe and PbTe are intentionally omitted and 
the rest of 99 compounds are used. 


respectively. The compound with the lowest LTC among 
the 101 compounds, i.e., rocksalt PbSe, can be found 
much more efficiently using the kriging technique and 
only with two variables, V and p. However, we realize 
that the kriging only with these two variables is not a 
robust way for finding the lowest LTC. As an example, 
Fig. [2] (b) shows the result of the kriging search using the 
dataset after intentionally removing 1st and 2nd lowest 
LTC compounds, i.e., rocksalt PbSe and PbTe, from the 
101 compounds. Then rocksalt Lil should be the right 
answer of the optimization. However, 7V ave is 65 for find¬ 
ing Lil using the kriging only with V and p, which is 
larger than that of the random search, 7V ave = 50. The 
delay of the optimization should originate from the fact 
that Lil is an outlier when LTC is modeled only with V 
and p. Such outlier compounds with low LTC are difficult 


to find only with V and p. 

In order to overcome the outlier problem, we add pre¬ 
dictors about constituent chemical elements. There are 
many choices for such variables: They are, for exam¬ 
ple, electronegativity, atomic radius, ionization energy, 
etc[26]. Here, we newly introduced “elemental descrip¬ 
tors”, which is a set of binary digits representing the 
presence of chemical elements. Since the 101 LTC data 
is composed of 34 kinds of elements, we use 34 elemental 
descriptors. Results of the kriging are shown in Figs. [2] 
(a) and (b) with 34 elemental descriptors on top of V 
and p. In both cases, the compound of the lowest LTC is 
found with N ave = 19. The use of the elemental descrip¬ 
tors is found to improve the robustness of the efficient 
search. 

As described in the Supplemental Material (SM), bet¬ 
ter correlations with LTC can be found for parameters 
that are obtained from phonon density of states. How¬ 
ever, we do not use such phonon parameters as predictors 
in the present study, because there is no data library 
available for such phonon parameters for a wide range 
of compounds. Hereafter, we show results only with the 
predictor set composed of 34 elemental descriptors on top 
of V and p. 

Screening for low LTC compounds over compounds in a 
large library is carried out using a GPR prediction model. 
Such a screening based on a prediction model is called 
“virtual screening” in biomedical communities [32l. For 
the virtual screening, we adopt all 54,779 compounds in 
MPD library 291 133]. which is composed of most of crystal 
structure data available in ICSD 27 . On the basis of the 
GPR prediction model made by V, p and 34 elemental 
descriptors for the 101 LTC data, a ranking for low LTC 
compounds is made according to the Z-score of the 54,779 
compounds. 

Figure [3] shows the distribution of Z-scores for the 
54,779 compounds along with V and p. The magni- 
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FIG. 3. Dependence of Z-score on constituent elements for 
compounds in the MPD library. The magnitude of Z-score 
is shown by colors along with volume, V , and density, p, for 
each element. 


tilde of Z-score is plotted in panels corresponding to con¬ 
stituent elements. (Transition metal and other elements 
are shown in SM.) The Z-score is relative to rocksalt 
PbSe showing the lowest LTC among 101 compounds. 
Among 54,779 compounds, 221 compounds show posi¬ 
tive Z-score, which are expected to have lower LTC than 
that of rocksalt PbSe, i.e., < 0.9 W/mK (@300 K). They 
are highlighted by red dots. They are widely distributed 
in V — p space; which means it is difficult to pick them up 
without performing the Bayesian optimization with ele¬ 
mental descriptors. The Z-score is widely distributed for 
light elements such as Li, N, O and F. This implies that 
the presence of such light elements by itself have little 
effects on lowering the LTC. When such light elements 
form a compound with heavy elements, the compound 
tends to show high Z-score. It is also noteworthy that 
many compounds composed of some light elements such 
as Be and B tend to show high LTC. 

Special features are recognized for Pb, Cs, I, Br and 
Cl. Many compounds composed of these elements ex¬ 
hibit high Z-score. (The number of compounds with pos¬ 
itive Z-score is shown in SM.) Most of compounds show¬ 
ing positive Z-score have any of atomic combinations of 
these five elements. On the other hand, elements in the 
Periodic table neighboring to these five elements do not 
show analogous trends. For example, compounds with 
high Z-scores are rarely found for T1 and Bi, which are 
neighboring to Pb. This may sound odd since Bi 2 Te 3 is a 
famous thermoelectric compound. This may be ascribed 
to our selection of the training dataset composed only 
of AB compounds with 34 elements and three kinds of 
simple crystal structures. In other words, the training 
dataset is somehow “biased”. This is unavoidable at the 
moment since the first-principles LTC calculations are 
still too expensive to obtain sufficiently unbiased train¬ 
ing dataset with a large enough number of data to cover 
the diversity of chemical composition and crystal struc¬ 



FIG. 4. Crystal structures of K 2 CdPb and Cs 2 [PdCl 4 ]l 2 
predicted to show low LTC of < 0.5 W/mK (@300 K) and 
narrow band gap of < 1 eV. 


tures. Nevertheless, the “biased” training dataset will be 
verified to be useful for finding low LTC materials. Be¬ 
cause of the use of the “biased” training dataset, we may 
not be able to discover all of the low LTC materials in 
the library. However, we can discover at least a part of 
them. 

Verification process for the candidates of low LTC com¬ 
pounds after the virtual screening is one of the most im¬ 
portant steps to “discover” low LTC compounds. First 
principles LTCs are evaluated for the top 8 compounds 
after the virtual screening. All of them are considered 
to form ordered structures. LTC calculation was unsuc¬ 
cessful for Pb 2 RbBr 5 due to the presence of imaginary 
phonon modes within the supercell used in the present 
study. Z-scores and first principles LTC of the rest of the 
compounds are listed in Table 1. All of top 5 compounds 
show LTC of < 0.2 W/mK (@300 K), which are much 
lower than that of the rocksalt PbSe, i.e., 0.9 W/mK 
(@300 K). This confirms the powerfulness of the present 
GPR prediction model for efficiently discovering low LTC 
compounds. Crystal structures of highly ranked com¬ 
pounds, PbRbI 3 , PbIBr, PbRb 4 Br 6 and Pbl 2 (P6 3 rac) 
are shown in SM. PbICl and PbCIBr have the same crys¬ 
tal structures as PbIBr. Pbl 2 (R3m) and Pbl 2 (P6 3 rac) 
are different only in their stacking sequences. All of these 
compounds contain either six-fold or eight-fold coordi¬ 
nated Pb by halogen ions, and are of stoichiometric chem¬ 
ical composition when Pb is divalent. 

When such LTC materials are considered for ther¬ 
moelectric applications, properties related to electronic 
structures, namely electronic contribution of the thermal 
conductivity, electrical conductivity and Seebeck coeffi¬ 
cient should also be optimized. Although they can be 
tuned by elemental doping, the band gap, E gi should be 
a simple measure of the electronic structure and allows to 
discriminate in a simple way between materials that can 
be good thermoelectrics or not. All of 221 compounds 
showing positive Z-score are listed in SM together with 
Eg (DFT-PBE) given in the MPD library. Among them 
only 19 compounds satisfy 0.1 < E g < 1.0 eV. First prin¬ 
ciples LTCs are evaluated for them. Crystal structures 
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TABLE I. First principles LTCs and Z-scores for highly 
ranked compounds by the virtual screening. Band gaps by 
DFT-PBE are taken from MPD library [29l 133] . 


Ranking Z 

!-score 

Formula 

Space 

group 

LTC Band 

(W/mK) gap (eV) 

1 

1.90 

PbRbls 

Pnma 

0.10 

2.46 

2 

1.76 

PbIBr 

Pnma 

0.13 

2.56 

3 

1.56 

PbRb4Br6 

R3c 

0.08 

3.90 

4 

1.56 

PbICl 

Pnma 

0.18 

2.72 

5 

1.56 

PbCIBr 

Pnma 

0.09 

3.44 

7 

1.44 

Pbl 2 

R3m 

0.29 

2.42 

8 

1.43 

Pbl 2 

PQ^mc 

0.29 

2.45 

121 

0.39 

K 2 CdPb 

Ama2 

0.45 

0.18 

144 

0.29 

Cs 2 [PdCl 4 ]l 2 

IA/mmm 

0.31 

0.88 


and LTC for two of them are shown in Fig. [4] and Ta¬ 
ble |TJ Both of K 2 CdPb and Cs 2 [PdCl 4 ]l 2 are predicted to 
exhibit LTC of less than 0.5 W/mK (@300 K) together 
with band gap of smaller than 1 eV. The discovery of 
such compounds may open a gate toward designing new 
thermoelectric materials with exceptionally high figure of 
merit. 

In this study, we first report the theoretical LTC of 101 
compounds by first-principles anharmonic lattice dynam¬ 
ics calculations. Using these data, the virtual screening 
of a library containing 54,779 compounds is performed 
by Bayesian optimization using kriging method based on 
the Gaussian process regressions. 221 materials with very 
low LTC are found from this screening. A final filtering 
of those low LTC compounds is made using the electronic 
band gap, which is a measure to discriminate in a simple 
way between materials that can be good thermoelectrics 
or not. Two compounds with low LTC of < 0.5 W/mK 
(@300K) and narrow band gap of < 1 eV are thus dis¬ 
covered, which may open a gate toward designing new 
thermoelectric materials with exceptionally high figure of 
merit. The present method should be useful for searching 
materials for many different applications in which chem¬ 
istry of materials are required to be optimized. 
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